Aggregation-Aware Top-k Computation for Full-Text Search
Authors
Abstract
A typical scenario in information retrieval and web search is to index items of a given type (e.g., web pages, images) and provide search functionality over them. In such a scenario, the basic units of indexing and retrieval are the same, and efficient top-k computation in this setting has been studied extensively. This paper studies top-k processing for a class of emerging scenarios: efficiently retrieving the top-k items of one type based on the inverted index of another type of items. Directly applying traditional top-k approaches to this problem would be very inefficient. We follow TA (the Threshold Algorithm) in this scenario. We present an aggregation-aware top-k computation framework with three pruning principles built on the conventional inverted index, together with a novel inverted index type, HybridRank, which uses the item information of both types. Experimental results show that the proposed index structure and the aggregation-aware top-k strategy provide an efficient solution to this aggregation-aware top-k problem.
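For readers unfamiliar with TA, the following is a minimal sketch of the classical Threshold Algorithm on ordinary per-term score lists, where the indexed and retrieved items are the same type. It is not the paper's aggregation-aware framework or the HybridRank index. The function name `threshold_topk`, the use of summation as the monotone aggregation function, and the convention that a missing score counts as 0 are assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Posting = Tuple[str, float]  # (item_id, non-negative score); hypothetical layout


def threshold_topk(sorted_lists: List[List[Posting]], k: int) -> List[Posting]:
    """Classical TA sketch: each list is sorted by score, descending.

    Assumes the aggregate score is the SUM of per-list scores and that an
    item absent from a list has score 0 there.
    """
    if k <= 0 or not sorted_lists:
        return []

    # Random-access structure: one id -> score map per list.
    random_access: List[Dict[str, float]] = [dict(lst) for lst in sorted_lists]
    seen = set()
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k

    max_depth = max(len(lst) for lst in sorted_lists)
    for depth in range(max_depth):
        # Upper bound on the aggregate score of any item not yet seen.
        threshold = 0.0
        for lst in sorted_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen items score 0 here
            item, score = lst[depth]
            threshold += score
            if item in seen:
                continue
            seen.add(item)
            # Random access: fetch the item's score in every list.
            total = sum(ra.get(item, 0.0) for ra in random_access)
            if len(heap) < k:
                heapq.heappush(heap, (total, item))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, item))
        # Stop early once the k-th best score reaches the threshold.
        if len(heap) == k and heap[0][0] >= threshold:
            break

    return sorted(((item, total) for total, item in heap),
                  key=lambda p: (-p[1], p[0]))


# Toy data: two term lists over documents d1..d3.
term_a = [("d1", 9.0), ("d2", 8.0), ("d3", 1.0)]
term_b = [("d2", 7.0), ("d3", 6.0), ("d1", 2.0)]
print(threshold_topk([term_a, term_b], 1))  # [('d2', 15.0)]
```

In the toy run, the scan stops after the second row: the best score found (15) already exceeds the threshold (8 + 6 = 14), so the third row is never read. In the scenario the abstract describes, the lists hold items of one type while the ranked results are of another type, which is why the abstract argues that applying such a procedure directly is inefficient.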
Similar resources
TopX: efficient and versatile top-k query processing for text, structured, and semistructured data
TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonous score aggregation function with respect to a multidimensional query. The main contributions of the thesis unfold into four main points, confirmed by previous publications at international conference...
RWTH Aachen University, I 5 Max-Planck-Institut für Informatik, AG 5 Holistic Top-k
Querying large data sets is a challenging task in today’s information systems. Users are typically interested in the k most relevant results, namely the first page (e.g., the Google search engine) of the given result set. That is, given a dataset D and a user-defined similarity function f, we are interested in calculating the top-k, i.e., the k highest ranked results (answers). Finding the top-...
Search for the Best but Expect the Worst - Distributed Top-k Queries over Decreasing Aggregated Scores
We consider distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. In contrast to existing work, we exclusively consider distributed top-k queries over decreasing aggregated values. State-of-the-art distributed top-k algorithms usually depend on threshold propagation to reduce expen...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...